We address the role of noise and the issue of efficient computation in stochastic optimal control
problems. We consider a class of non-linear control problems that can be formulated as a path
integral and where the noise plays the role of temperature. The path integral displays symmetry
breaking and there exists a critical noise value that separates regimes where optimal control yields
qualitatively different solutions. The path integral can be computed efficiently by Monte Carlo
integration or by Laplace approximation, and can therefore be used to solve high dimensional stochastic
control problems.
Optimal control of non-linear systems in the presence
of noise is a very general problem that occurs in many
areas of science and engineering. It underlies autonomous
system behavior, such as the control of movement and
planning of actions of animals and robots, but also, for
instance, the optimization of financial investment policies
and the control of chemical plants. The problem is simply
stated: given that the system is in this configuration at
this time, what is the optimal course of action to reach
a goal state at some future time? The cost of each time
course of actions typically consists of a path contribution,
which specifies the amount of work or other cost of the
trajectory, and an end cost, which specifies to what extent
the trajectory reaches the goal state.

In the absence of noise, the optimal control problem
can be solved in two ways: using the Pontryagin Minimum
Principle (PMP) [1], which is a pair of ordinary differential
equations that are similar to the Hamilton equations
of motion, or using the Hamilton-Jacobi-Bellman
(HJB) equation, which is a partial differential equation.

In the presence of (Wiener) noise, the PMP formalism
is replaced by a set of stochastic differential equations
which become difficult to solve (see however [3]). The inclusion
of noise in the HJB framework is mathematically
quite straightforward, yielding the so-called stochastic
HJB equation. Its solution, however, requires a discretization
of space and time, and the computation becomes
intractable in both memory requirement and CPU
time in high dimensions. As a result, deterministic control
can be computed efficiently using the PMP approach,
but stochastic control is intractable due to the curse of
dimensionality.

For small noise, one expects that optimal stochastic
control resembles optimal deterministic control, but for
larger noise, the optimal stochastic control can be entirely
different from the deterministic control. There is
currently no good understanding of how noise affects
optimal control.

In this paper, we address both the issue of efficient 
computation and the role of noise in stochastic optimal 
control. We consider a class of non-linear stochastic control
problems that can be formulated as a statistical mechanics
problem. This class of control problems includes
arbitrary dynamical systems, but with a limited control
mechanism. It contains linear-quadratic control as a
special case. We show that under certain conditions on 
the noise, the HJB equation can be written as a linear 
partial differential equation 



$$-\partial_t \psi = H\psi \tag{1}$$

with $H$ a (non-Hermitian) operator. Eq. (1) must be solved
subject to a boundary condition at the end time. As
a result of the linearity of Eq. (1), the solution can be
obtained in terms of a diffusion process evolving forward 
in time, and can be written as a path integral. The path 
integral has a direct interpretation as a free energy, where 
noise plays the role of temperature. 

This link between stochastic optimal control and a free 
energy has two immediate consequences. 1) Phenomena
that allow for a free energy description typically display
phase transitions. We argue that for stochastic optimal
control one can identify a critical noise value that
separates regimes where the optimal control is qualita- 
tively different and illustrate this with a simple exam- 
ple. 2) Since the path integral appears in other branches 
of physics, such as statistical mechanics and quantum 
mechanics, we can borrow approximation methods from 
those fields to compute the optimal control approxi- 
mately. We show how the Laplace approximation can 
be combined with Monte Carlo sampling to efficiently 
compute the optimal control. 

Let x be an n-dimensional stochastic variable that is 
subject to the stochastic differential equation 
$$dx = (b(x,t) + u)\,dt + d\xi \tag{2}$$

with $d\xi$ a Wiener process with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}\,dt$, and
$\nu_{ij}$ independent of $x$, $u$, $t$. $b(x,t)$ is an arbitrary $n$-dimensional
function of $x$ and $t$, and $u$ an $n$-dimensional
vector of control variables. Given $x$ at an initial time
$t$, the stochastic optimal control problem is to find the
control path $u(\cdot)$ that minimizes

$$C(x,t,u(\cdot)) = \left\langle \phi(x(t_f)) + \int_t^{t_f} d\tau \left( \frac{1}{2} u(\tau)^T R\, u(\tau) + V(x(\tau),\tau) \right) \right\rangle_x \tag{3}$$

with $R$ a matrix, $V(x,t)$ a time-dependent potential, and
$\phi(x)$ the end cost. The brackets $\langle\,\rangle_x$ denote expectation
value with respect to the stochastic trajectories that
start at $x$.

One defines the optimal cost-to-go function from any
time $t$ and state $x$ as

$$J(x,t) = \min_{u(\cdot)} C(x,t,u(\cdot)). \tag{4}$$

$J$ satisfies the stochastic HJB equation, which takes the
form

$$-\partial_t J = \min_u \left( \frac{1}{2} u^T R u + V + (b+u)^T \nabla J + \frac{1}{2} \mathrm{Tr}\left(\nu \nabla^2 J\right) \right)
= -\frac{1}{2} (\nabla J)^T R^{-1} \nabla J + V + b^T \nabla J + \frac{1}{2} \mathrm{Tr}\left(\nu \nabla^2 J\right) \tag{5}$$

with $\mathrm{Tr}(\nu\nabla^2 J) = \sum_{ij} \nu_{ij}\, \partial^2 J/\partial x_i \partial x_j$ and

$$u = -R^{-1} \nabla J(x,t) \tag{6}$$

the optimal control at $x$, $t$. The HJB equation is non-linear
in $J$ and must be solved with end boundary condition
$J(x,t_f) = \phi(x)$.

Define $\psi(x,t)$ through

$$J(x,t) = -\lambda \log \psi(x,t) \tag{7}$$

and assume there exists a scalar $\lambda$ such that

$$\lambda \delta_{ij} = (R\nu)_{ij} \tag{8}$$

with $\delta_{ij}$ the Kronecker delta. In the one dimensional
case, such a $\lambda$ can always be found. In the higher dimensional
case, this restricts the matrices to $R \propto \nu^{-1}$.
Eq. (8) reduces the dependence of optimal control on the
$n$-dimensional noise matrix to a scalar value $\lambda$ that will
play the role of temperature. Eq. (5) reduces to the linear
equation (1) with

$$H = -\frac{V}{\lambda} + b^T \nabla + \frac{1}{2} \mathrm{Tr}\left(\nu \nabla^2\right). \tag{9}$$

Let $\rho(y,\tau|x,t)$ with $\rho(y,t|x,t) = \delta(y - x)$ describe a
diffusion process for $\tau > t$ defined by the Fokker-Planck
equation

$$\partial_\tau \rho = H^\dagger \rho = -\frac{V}{\lambda}\rho - \nabla^T (b\rho) + \frac{1}{2} \mathrm{Tr}\left(\nu \nabla^2 \rho\right) \tag{10}$$

with $H^\dagger$ the Hermitian conjugate of $H$. Then $A(\tau) =
\int dy\, \rho(y,\tau|x,t)\, \psi(y,\tau)$ is independent of $\tau$ and in particular
$A(t) = A(t_f)$. It immediately follows that

$$\psi(x,t) = \int dy\, \rho(y,t_f|x,t) \exp(-\phi(y)/\lambda). \tag{11}$$

We arrive at the important conclusion that $\psi(x,t)$ can
be computed either by backward integration using Eq. (1)
or by forward integration of a diffusion process given by
Eq. (10).

We can write the integral in Eq. (11) as a path integral.
We use the standard argument and divide the time
interval $t \to t_f$ in $n_1$ intervals and write $\rho(y,t_f|x,t) =
\prod_{i=1}^{n_1} \rho(x_i,t_i|x_{i-1},t_{i-1})$ and let $n_1 \to \infty$. The result is

$$\psi(x,t) = \int [dx]_x \exp\left( -\frac{1}{\lambda} S(x(t \to t_f)) \right) \tag{12}$$

with $\int [dx]_x$ an integral over all paths $x(t \to t_f)$ that start
at $x$ and with

$$S(x(t \to t_f)) = \phi(x(t_f)) + \int_t^{t_f} d\tau \left( \frac{1}{2} \left( \frac{dx(\tau)}{d\tau} - b(x(\tau),\tau) \right)^T R \left( \frac{dx(\tau)}{d\tau} - b(x(\tau),\tau) \right) + V(x(\tau),\tau) \right) \tag{13}$$

the Action associated with a path. From Eqs. (7) and (12),
the cost-to-go $J(x,t)$ becomes a log partition sum (i.e. a
free energy) with temperature $\lambda$.

The path integral Eq. (12) can be estimated by stochastic
integration from $t$ to $t_f$ of the diffusion process Eq. (10),
in which particles get annihilated at a rate $V(x,t)/\lambda$:

$$x = x + b(x,t)\,dt + d\xi, \quad \text{with probability } 1 - V(x,t)\,dt/\lambda$$
$$x = \dagger, \quad \text{with probability } V(x,t)\,dt/\lambda \tag{14}$$

where $\dagger$ denotes that the particle is taken out of the
simulation. Denote the trajectories by $x_\alpha(t \to t_f)$, $\alpha =
1, \ldots, N$. Then $\psi(x,t)$ and $u$ are estimated as

$$\hat\psi(x,t) = \sum_{\alpha \in \text{alive}} w_\alpha \tag{15}$$

$$\hat u\, dt = \frac{1}{\hat\psi(x,t)} \sum_{\alpha \in \text{alive}} w_\alpha\, d\xi_\alpha(t) \tag{16}$$

$$w_\alpha = \frac{1}{N} \exp(-\phi(x_\alpha(t_f))/\lambda)$$

where 'alive' denotes the subset of trajectories that do
not get killed along the way by the $\dagger$ operation. The normalization
$1/N$ ensures that the annihilation process is
properly taken into account. Eq. (16) states that optimal
control at time $t$ is obtained by averaging the initial directions
of the noise component of the trajectories $d\xi_\alpha(t)$,
weighted by their success at $t_f$.
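The sampling scheme of Eqs. (14)-(16) can be sketched in a few lines. The following is a minimal one-dimensional illustration, not the implementation used for the figures: the function name `naive_path_integral_mc`, its arguments, and the Euler time stepping are assumptions made here, and it requires $t_f > t$ and at least one surviving trajectory.

```python
import math
import random


def naive_path_integral_mc(x0, t, t_f, b, V, phi, R, nu,
                           n_traj=2000, dt=0.02, seed=0):
    """Estimate psi, J and u at (x0, t) for a scalar state (hypothetical helper).

    Particles follow dx = b dt + dxi and are removed with probability
    V dt / lam per step, as in Eq. (14), with lam = R * nu from Eq. (8).
    """
    rng = random.Random(seed)
    lam = R * nu
    n_steps = int(round((t_f - t) / dt))
    sigma = math.sqrt(nu * dt)          # standard deviation of one noise increment
    psi_hat = 0.0                       # sum of weights over survivors, Eq. (15)
    weighted_noise = 0.0                # sum of w_alpha * dxi_alpha(t), Eq. (16)
    for _ in range(n_traj):
        x, s = x0, t
        first_dxi = None
        alive = True
        for _ in range(n_steps):
            if rng.random() < V(x, s) * dt / lam:
                alive = False           # particle annihilated by the potential
                break
            dxi = rng.gauss(0.0, sigma)
            if first_dxi is None:
                first_dxi = dxi         # noise direction at the initial time
            x += b(x, s) * dt + dxi
            s += dt
        if alive:
            w = math.exp(-phi(x) / lam) / n_traj
            psi_hat += w
            weighted_noise += w * first_dxi
    J_hat = -lam * math.log(psi_hat)    # Eq. (7)
    u_hat = weighted_noise / (psi_hat * dt)
    return psi_hat, J_hat, u_hat
```

For $b = 0$, $V = 0$ and $\phi(x) = \frac{1}{2}x^2$ with $R = \nu = 1$, the Gaussian integral in Eq. (11) can be done by hand, giving $\psi(1, 0) = e^{-1/4}/\sqrt{2}$ and $u = -1/2$ for $t_f = 1$, which makes a convenient check of the estimator.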

The above sampling procedure can be quite inefficient
when many trajectories get annihilated. One of
the simplest procedures to improve it is importance
sampling. We replace the diffusion process that yields
$\rho(y,t_f|x,t)$ by another diffusion process, that will yield
$\rho'(y,t_f|x,t) = \exp(-S'/\lambda)$. Then Eq. (12) becomes

$$\psi(x,t) = \int [dx]_x \exp(-S'/\lambda) \exp(-(S - S')/\lambda)$$

The idea is to choose $\rho'$ such as to make the sampling
of the path integral as efficient as possible. Here, we
use the Laplace approximation, which is given by the $k$
deterministic trajectories $x_\beta(t \to t_f)$ that minimize the
Action

$$J(x,t) \approx -\lambda \log \sum_{\beta=1}^{k} \exp(-S(x_\beta(t \to t_f))/\lambda) \tag{17}$$

The Laplace approximation ignores all fluctuations
around the modes and becomes exact in the limit $\lambda \to 0$.
The Laplace approximation can be computed efficiently,
requiring $O(n^2 m^2)$ operations, where $m$ is the number of
time discretization steps.

For each Laplace trajectory, we define a diffusion process
$\rho'_\beta$ according to Eq. (14) with $b(x,t) = \dot{x}_\beta(t)$. The
estimators for $\psi$ and $u$ are given again by Eqs. (15) and (16)
but with weights

$$w_\alpha = \frac{1}{N} \exp\left( -\left( S(x_\alpha(t \to t_f)) - S'_\beta(x_\alpha(t \to t_f)) \right)/\lambda \right). \tag{18}$$

$S$ is the original Action Eq. (13) and $S'_\beta$ is the new Action
for the Laplace guided diffusion. When there are multiple
Laplace trajectories one should include all of these in the
sample.
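The reweighting in Eq. (18) can be illustrated with a small sketch. It samples a scalar diffusion with a guiding drift and accumulates the Action difference $S - S'$ along each discretized path. The name `guided_psi_estimate` and the argument `b_guide` are hypothetical, and the guiding drift is an arbitrary user-supplied function rather than a computed Laplace trajectory. Only $\psi$ and $J$ are estimated here.

```python
import math
import random


def guided_psi_estimate(x0, T, b, b_guide, V, phi, R, nu,
                        n_traj=5000, dt=0.05, seed=0):
    """Importance-sampled estimate of psi(x0, 0) and J(x0, 0) over horizon T.

    Paths are drawn from dx = b_guide dt + dxi (an assumed guiding drift)
    and reweighted by exp(-(S - S') / lam), cf. Eq. (18).
    """
    rng = random.Random(seed)
    lam = R * nu
    n_steps = int(round(T / dt))
    sigma = math.sqrt(nu * dt)
    total = 0.0
    for _ in range(n_traj):
        x, s = x0, 0.0
        s_diff = 0.0                    # path part of S - S' (end cost added later)
        for _ in range(n_steps):
            b0, bg = b(x, s), b_guide(x, s)
            dx = bg * dt + rng.gauss(0.0, sigma)
            v = dx / dt                 # discretized velocity dx/dtau
            # (v - b0)^2 - (v - bg)^2 written in factored form
            kinetic = 0.5 * R * (bg - b0) * (2.0 * v - b0 - bg)
            s_diff += (kinetic + V(x, s)) * dt
            x += dx
            s += dt
        total += math.exp(-(s_diff + phi(x)) / lam)
    psi_hat = total / n_traj
    return psi_hat, -lam * math.log(psi_hat)
```

With $b = 0$, $V = 0$, $\phi(x) = \frac{1}{2}x^2$ and $R = \nu = 1$, a constant guiding drift of $-1.5$ from $x_0 = 3$ over a unit horizon steers the samples towards the region where the end cost is small, and the estimate of $J$ can be compared with the Gaussian result $\frac{1}{2}\log 2 + \frac{9}{4}$.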

We give a simple one-dimensional example of a double 
slit to illustrate the effectiveness of the Laplace guided 
MC method and to show how the optimal cost-to-go un- 
dergoes symmetry breaking as a function of the noise. 
FIG. 1: A double slit is placed at $t = 1$ with openings at
$-6 < x < -4$ and $6 < x < 8$. $V = \infty$ for $t = 1$ outside the
openings, and zero otherwise. Also shown are two example
trajectories under optimal control.
FIG. 2: Comparison of Laplace approximation (dotted line)
and Monte Carlo importance sampling (solid jagged line) of
$J(x, t = 0)$ with exact result (solid smooth line) for the double
slit problem. The importance sampler used $N = 100$ trajectories
for each $x$. $R = 0.1$, $\nu = 1$, $dt = 0.02$.

Consider a stochastic particle that moves with constant
velocity from $t = 0$ to $t_f = 2$ in the horizontal direction
and where there is deflecting noise in the $x$ direction:

$$dx = u\,dt + d\xi$$

The cost is given by Eq. (3) with $\phi(x) = \frac{1}{2}x^2$, and $V(x, t_1)$
implements a slit at an intermediate time $t_1 = 1$ (Fig. 1).
Solving the cost-to-go by means of the forward computation
using Eq. (11) can be done in closed form. The
exact result, the Laplace approximation Eq. (17) and the
Laplace guided importance sampling result using Eq. (18)
are plotted for $t = 0$ as a function of $x$ in Fig. 2. For each
$x$, the Laplace approximation consists of the two deterministic
trajectories, each being piecewise linear, starting
at $t = 0$ in $x$ and ending at $t = 2$ in $x = 0$. We see
that the Laplace approximation is quite good for this example,
in particular when one takes into account that a
constant shift in $J$ does not affect the optimal control.
The MC importance sampler has maximal error of order
0.1 and is significantly better than the Laplace approximation.
Naive MC sampling (not shown)
fails for this problem, because most trajectories get killed
by the infinite potential. Numerical simulations using
$N = 100000$ trajectories yield estimation errors in $J$ up
to approximately 6 for certain values of $x$.
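A rough numerical counterpart of this comparison can be sketched as follows, using the parameter values from the captions of Figs. 1 and 2. Two things here are assumptions of the sketch, not the text's closed-form expression: the integral over the slit position is evaluated with a simple midpoint rule, and the Laplace trajectories are taken as straight segments through the best point of each slit with the end point left free under the quadratic end cost. All function names are hypothetical.

```python
import math

R, NU = 0.1, 1.0                       # values quoted in the Fig. 2 caption
T1, TF = 1.0, 2.0                      # slit time and end time
LAM = R * NU                           # Eq. (8) in one dimension
SLITS = [(-6.0, -4.0), (6.0, 8.0)]     # openings quoted in the Fig. 1 caption
T2 = TF - T1


def action_through(y, x):
    """Deterministic Action of a piecewise-linear path from x (t=0) via y (t=T1)."""
    first_leg = R * (y - x) ** 2 / (2.0 * T1)
    second_leg = R * y ** 2 / (2.0 * (R + T2))   # optimal free end under phi = x^2/2
    return first_leg + second_leg


def j_laplace(x):
    """Eq. (17) with one minimizing trajectory per slit."""
    total = 0.0
    for lo, hi in SLITS:
        y_free = x * (R + T2) / (R + T2 + T1)    # unconstrained minimizer
        y_star = min(max(y_free, lo), hi)        # clamp into the opening
        total += math.exp(-action_through(y_star, x) / LAM)
    return -LAM * math.log(total)


def j_quadrature(x, n=4000):
    """Eq. (11) chained through the slit, integrated with the midpoint rule."""
    prefactor = 1.0 / math.sqrt(2.0 * math.pi * NU * T1 * (1.0 + T2 / R))
    psi = 0.0
    for lo, hi in SLITS:
        h = (hi - lo) / n
        for i in range(n):
            y = lo + (i + 0.5) * h
            psi += math.exp(-action_through(y, x) / LAM) * h
    return -LAM * math.log(prefactor * psi)
```

Evaluating both functions on a grid of $x$ values gives two curves that differ mainly by a slowly varying offset.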
FIG. 3: Symmetry breaking in $J$ as a function of $T$ implies
a 'delayed choice' mechanism for optimal stochastic control.
When the target is far in the future, the optimal policy is to
steer between the targets. Only when $T < 1/\nu$ should one
aim for one of the targets. Sample trajectories (top row) and
controls (bottom row) under stochastic control (left column)
and deterministic control (right column), $\nu = R = 1$, $t_1 = 2$.



We now show by example how optimal stochastic control
exhibits spontaneous symmetry breaking. For two slits
of width $\epsilon$ at $x = \pm 1$, the cost-to-go becomes, to lowest
order in $\epsilon$:

$$J(x,t) = \frac{R}{T} \left( \frac{1}{2} x^2 - \nu T \log 2\cosh\frac{x}{\nu T} \right) + \text{const}, \qquad t < t_1,$$

where the constant diverges as $O(\log \epsilon)$ independent of $x$,
and $T = t_1 - t$ is the time to reach the slits. The expression
between brackets is a typical free energy with inverse
temperature $\beta = 1/\nu T$. It displays a symmetry breaking
at $\nu T = 1$. The optimal control is given by the gradient
of $J$:

$$u = \frac{1}{T} \left( \tanh\frac{x}{\nu T} - x \right) \tag{19}$$



For $T > 1/\nu$ (far in the past) optimal control steers towards
$x = 0$ (between the targets) and delays the choice
which slit to aim for until later. The reason why this is 
optimal is that the expected diffusion alone, of size $\sqrt{\nu T}$, is
likely to reach any of the slits without control (although 
it is not clear yet which slit). Only sufficiently late in 
time ($T < 1/\nu$) should one make a choice.
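The threshold $\nu T = 1$ can be checked directly from the curvature of the free energy. The short derivation below assumes the cost-to-go has the form $J = \frac{R}{T}\left(\frac{1}{2}x^2 - \nu T \log 2\cosh\frac{x}{\nu T}\right) + \text{const}$ and uses $u = -R^{-1}\partial_x J$ from Eq. (6). The symbol $F$ is introduced here only for this illustration.

```latex
\begin{align*}
F(x) &= \tfrac{1}{2}x^2 - \nu T \log\!\left(2\cosh\frac{x}{\nu T}\right),
  & J &= \frac{R}{T}\,F(x) + \text{const},\\
F'(x) &= x - \tanh\frac{x}{\nu T},
  & u &= -R^{-1}\partial_x J = \frac{1}{T}\left(\tanh\frac{x}{\nu T} - x\right),\\
F''(0) &= 1 - \frac{1}{\nu T}.
\end{align*}
```

For $\nu T > 1$ the curvature at the origin is positive and $x = 0$ is the only stationary point, so the control steers towards the middle. For $\nu T < 1$ the origin becomes a maximum and two minima appear at the non-zero solutions of $x = \tanh(x/\nu T)$, which approach $\pm 1$ as $\nu T \to 0$.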

Figure 3 depicts two trajectories and their controls under
stochastic optimal control (Eq. (19)) and deterministic
optimal control (Eq. (19) with $\nu = 0$), using the same realization
of the noise. Note that at early times the deterministic
control drives $x$ away from zero, whereas the
stochastic control drives $x$ towards zero and is smaller in
size. The stochastic control delays the choice for which
slit to aim until $T \approx 1$.
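The delayed-choice behaviour can be reproduced with a short simulation. This sketch assumes the control law $u = (\tanh(x/\nu T) - x)/T$ for the stochastic case and its $\nu \to 0$ limit, in which the hyperbolic tangent becomes the sign of $x$, for the deterministic case. The Euler stepping, the function names, and the convention that the sign of zero is $+1$ are choices made here for illustration.

```python
import math
import random


def stochastic_control(x, T, nu):
    """Control with noise level nu and time-to-go T (assumed form of Eq. (19))."""
    return (math.tanh(x / (nu * T)) - x) / T


def deterministic_control(x, T):
    """nu -> 0 limit: tanh becomes sign(x); sign(0) is taken as +1 here."""
    sign = 1.0 if x >= 0.0 else -1.0
    return (sign - x) / T


def simulate(x0=0.0, t1=2.0, nu=1.0, dt=0.01, seed=0):
    """Euler simulation of dx = u dt + dxi under both controls, same noise."""
    rng = random.Random(seed)
    n_steps = int(round(t1 / dt))
    sigma = math.sqrt(nu * dt)
    xs, xd = x0, x0
    path_s, path_d = [xs], [xd]
    for k in range(n_steps):
        T = (n_steps - k) * dt          # time left until the slits
        dxi = rng.gauss(0.0, sigma)     # shared noise realization
        xs += stochastic_control(xs, T, nu) * dt + dxi
        xd += deterministic_control(xd, T) * dt + dxi
        path_s.append(xs)
        path_d.append(xd)
    return path_s, path_d
```

At $x = 0.1$ and $T = 2$ with $\nu = 1$, the stochastic control is small and negative (towards zero) while the deterministic control is positive (towards the slit at $+1$), which is the qualitative difference described above.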

In summary, we have shown that stochastic optimal 
control involves symmetry breaking with qualitatively 
different solutions for high and low noise levels. This 
property is expected to be true also for more general 
stochastic control problems. The path integral formu- 
lation allows for an efficient solution of the HJB equa- 
tion because it replaces the intractable n-dimensional nu- 
merical integration by a Monte Carlo sampling, which is 
known to be often much more efficient. This approach
will thus be of direct practical value for the control of
high-dimensional, strongly non-linear systems, such as
robot arms, navigation of autonomous systems,
and chemical reactions. For realistic applications, naive
sampling should be replaced by more advanced sampling 
schemes, such as importance sampling or a Metropolis 
method, and should be combined with efficient discretization
such as splines, wavelets or a Fourier basis.